Time-Series Classification in Many Intrinsic Dimensions
Abstract
In many data mining tasks, high dimensionality has been shown to pose significant problems, commonly described as aspects of the curse of dimensionality. In this paper, we investigate, in the time-series domain, one such aspect called hubness: the tendency of some instances in a data set to become hubs by appearing in unexpectedly many k-nearest neighbor lists of other instances. Through empirical measurements on a large collection of time-series data sets, we demonstrate that the hubness phenomenon is caused by the high intrinsic dimensionality of time-series data, and we shed light on the mechanism through which hubs emerge, focusing on the popular and successful dynamic time warping (DTW) distance. We also investigate the interaction between hubness and the information provided by class labels by considering label matches and mismatches between neighboring time series. Based on these findings, we formulate a framework for categorizing time-series data sets using measurements that reflect hubness and the diversity of class labels among nearest neighbors. The framework allows one to assess whether hubness can be used successfully to improve the performance of k-NN classification. Finally, the merits of the framework are demonstrated through an experimental evaluation of 1-NN and k-NN classifiers, including a proposed weighting scheme designed to make use of hubness information. Our experimental results show that, in the majority of cases, the framework correctly reflects the circumstances in which hubness information can be employed effectively in k-NN time-series classification.
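The quantities the abstract refers to can be illustrated with a short sketch. The code below is not the authors' implementation. It assumes an unconstrained DTW with absolute-difference local cost, counts how often each series appears in the k-nearest neighbor lists of the others (its k-occurrence count), and separately counts the occurrences where the labels of the two neighbors disagree. The function names `dtw_distance`, `k_occurrences`, and `skewness` are hypothetical, and using the skewness of the k-occurrence distribution as a hubness indicator is an assumption made for this example rather than a detail taken from the abstract.

```python
from math import inf


def dtw_distance(a, b):
    """Unconstrained DTW between two numeric sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    prev = [inf] * (m + 1)  # row 0 of the cumulative cost matrix
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible warping steps.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def k_occurrences(series, labels, k):
    """Return (total, bad): how often each series occurs in the k-NN lists of
    the other series, and how many of those occurrences are label mismatches."""
    n = len(series)
    total = [0] * n
    bad = [0] * n
    for i in range(n):
        # Rank all other series by DTW distance to series i.
        ranked = sorted(
            (dtw_distance(series[i], series[j]), j) for j in range(n) if j != i
        )
        for _, j in ranked[:k]:
            total[j] += 1
            if labels[j] != labels[i]:
                bad[j] += 1
    return total, bad


def skewness(values):
    """Population skewness; used here as an assumed indicator of hubness."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy_series = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [5, 5, 5], [5, 5, 6], [2, 3, 2]]
    toy_labels = ["a", "a", "a", "b", "b", "b"]
    total, bad = k_occurrences(toy_series, toy_labels, k=2)
    print("k-occurrences:", total)
    print("label-mismatch occurrences:", bad)
    print("skewness of k-occurrences:", skewness(total))
```

A series with a large k-occurrence count is a hub. A strongly right-skewed distribution of these counts suggests that a few series dominate the neighbor lists, and a high share of label-mismatch occurrences suggests that those hubs are likely to mislead a k-NN classifier.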
Publication date: 2010